Markovian Embeddings of General Random Strings
Abstract
Let A be a finite set and X a sequence of A-valued random variables. We do not assume any particular correlation structure between these random variables; in particular, X may be a non-Markovian sequence. An adapted embedding of X is a sequence of the form R(X1), R(X1, X2), R(X1, X2, X3), etc., where R is a transformation defined over finite-length sequences. In this extended abstract we characterize a wide class of adapted embeddings of X that result in a first-order homogeneous Markov chain. We show that any transformation R has a unique coarsest refinement R′ in this class such that R′(X1), R′(X1, X2), R′(X1, X2, X3), etc. is Markovian. (By refinement we mean that R′(u) = R′(v) implies R(u) = R(v), and by coarsest refinement we mean that R′ is a deterministic function of any other refinement of R in our class of transformations.) We propose a specific embedding that we denote as R, which is particularly amenable to analyzing the occurrence of patterns described by regular expressions in X. A toy example of a non-Markovian sequence of 0’s and 1’s is analyzed thoroughly: discrete asymptotic distributions are established for the number of occurrences of a certain regular pattern in X1, ..., Xn as n→∞, whereas a Gaussian asymptotic distribution is shown to apply for another regular pattern.
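The definitions of adapted embedding and refinement in the abstract can be illustrated with a small sketch. The helper names (`adapted_embedding`, `is_refinement`, `ends_with_11`, `last_two`) are hypothetical and chosen only for illustration; they are not the paper's embedding or notation. The refinement check is run on a finite set of binary words, so it can only refute the property there, not prove it in general.

```python
from itertools import product

def adapted_embedding(R, xs):
    """Return [R(x1), R(x1, x2), ..., R(x1, ..., xn)] for a finite sequence xs."""
    return [R(tuple(xs[:k])) for k in range(1, len(xs) + 1)]

def is_refinement(R_fine, R, words):
    """Check, on the given words only, that R_fine(u) == R_fine(v) implies R(u) == R(v)."""
    seen = {}
    for w in words:
        key = R_fine(w)
        if key in seen and seen[key] != R(w):
            return False  # two words share an R_fine value but differ under R
        seen.setdefault(key, R(w))
    return True

# Illustrative transformations over the alphabet A = {0, 1} (assumed for this example).
def ends_with_11(u):
    """Indicator that the word u ends with the pattern 11."""
    return tuple(u[-2:]) == (1, 1)

def last_two(u):
    """The last two symbols of u (or all of u if it is shorter)."""
    return tuple(u[-2:])

# All binary words of length 1 to 4.
words = [w for n in range(1, 5) for w in product((0, 1), repeat=n)]

print(adapted_embedding(ends_with_11, (0, 1, 1, 0)))
print(is_refinement(last_two, ends_with_11, words))
print(is_refinement(ends_with_11, last_two, words))
```

Here `last_two` refines `ends_with_11` on the tested words, because the last two symbols determine whether the word ends in 11, while the converse fails. Whether either embedding is Markovian depends on the law of X, which this sketch does not model.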
Similar resources
Recognition of Strings Using Nonstationary Markovian Models: An Application in ZIP Code Recognition
This paper presents Nonstationary Markovian Models and their application to recognition of strings of tokens, such as ZIP Codes in the US mailstream. Unlike traditional approaches where digits are simply recognized in isolation, the novelty of our approach lies in the manner in which recognition scores along with domain specific knowledge about the frequency distribution of various combination ...
Effect of random telegraph noise on entanglement and nonlocality of a qubit-qutrit system
We study the evolution of entanglement and nonlocality of a non-interacting qubit-qutrit system under the effect of random telegraph noise (RTN) in independent and common environments in Markovian and non-Markovian regimes. We investigate the dynamics of the qubit-qutrit system for different initial states. Such systems could exist in distant astronomical objects. A monotone decay of the nonlocalit...
Study of Random Biased d-ary Tries Model
Tries are the most popular data structure on strings. d-ary tries are constructed from strings over an alphabet. Throughout the paper we assume that the strings stored in the trie are generated by an appropriate memoryless source. In this paper, with a special combinatorial approach, we extend their analysis for average profiles to d-ary tries. We use this combinatorial appr...
Uniquely decodable n-gram embeddings
We define the family of n-gram embeddings from strings over a finite alphabet into the semimodule N . We classify all ∈ N that are valid images of strings under such embeddings, as well as all whose inverse image consists of exactly 1 string (we call such uniquely decodable). We prove that for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language. © 20...
Postprocessing of Recognized Strings Using Nonstationary Markovian Models
This paper presents Nonstationary Markovian Models and their application to recognition of strings of tokens. Domain specific knowledge is brought to bear on the application of recognizing ZIP Codes in the U.S. mailstream by the use of postal directory files. These files provide a wealth of information on the delivery points (mailstops) corresponding to each ZIP Code. This data feeds into the ...